Development of maximum relevant prior feature ensemble (MRPFE) index to characterize future drought using global climate models

Drought is one of the foremost outcomes of global warming and global climate change. It is a serious threat to humans and other living beings. To reduce the adverse impact of drought, mitigation strategies as well as sound projections of extreme events are essential. This research aims to strengthen the robustness of anticipated twenty-first century drought by combining different Global Climate Models (GCMs). In this article, we develop a new drought index, named Maximum Relevant Prior Feature Ensemble index that is based on the newly proposed weighting scheme, called weighted ensemble (WE). In the application, this study considers 32 randomly scattered grid points within the Tibetan Plateau region and 18 GCMs of Coupled Model Intercomparison Project Phase 6 (CMIP6) of precipitation. In this study, the comparative inferences of the WE scheme are made with the traditional simple model averaging (SMA). To investigate the trend and long-term probability of various classes, this research employs Markov chain steady states probability, Mann–Kendall trend test, and Sen’s Slope estimator. The outcomes of this research are twofold. Firstly, the comparative inference shows that the proposed weighting scheme has greater efficiency than SMA to conflate GCMs. Secondly, the research indicates that the Tibetan Plateau is projected to experience “moderate drought (MD)” in the twenty-first century.

drought risk due to rapid changes in the global warming environment. It is posing significant threats to pollution levels 11, agriculture, livestock 12, ecosystems, and human health 13. Nevertheless, effective planning and drought moderation techniques can mitigate these adverse effects to some extent. Drought moderation techniques include methods like rainwater harvesting, building reservoirs and dams, and planting drought-resistant crops.
A Global Climate Model (GCM) is a mathematical depiction of the Earth's atmosphere designed to simulate the Earth's climate system and ocean circulations 14,15. It is essential for climate studies, as it improves our understanding of, and ability to project, the behavior of the atmosphere 16, ocean, and climate 17. GCM simulations are commonly employed for assessing future drought risks and have been utilized by various researchers, including 18–20. Numerous researchers have utilized hydrological data to evaluate drought indices for estimating drought severity. Examples include the Palmer Drought Severity Index (PDSI) 21, the Normalized Ecosystem Drought Index (NEDI) 22, and the Standardized Precipitation Index (SPI) 23. Similarly, ref. 24 utilized precipitation data of GCMs to study future drought conditions. However, GCMs exhibit significant biases, particularly for the variables influencing hydrology.
The assessment of climate variables in the Coupled Model Intercomparison Project phase 5 (CMIP5) and CMIP6 models is prone to a certain amount of uncertainty and fluctuation. CMIP5 and CMIP6 simulate Earth's climate system and are used to project future climate scenarios under different greenhouse gas emission scenarios 24. Numerous authors have found that it is challenging to accurately estimate extreme hydrological occurrences because of the uncertainty of climate projection models 25. Using a single climate model decreases the reliability of the results in analyzing meteorological events 26. Numerous researchers suggested that using ensemble models might help to reduce the uncertainty 27,28.
Drought has been evaluated many times in a range of climate simulation scenarios using the ensemble approach. For instance, ref. 29 employed statistical and machine-learning techniques to construct an ensemble approach for thirty-four CMIP5 climate models. Ref. 30 utilized the multi-model ensemble and the delta method to project future temperature changes. Ruan et al. 31 examined potential fluctuations in temperature, precipitation, and drought characteristics using the CMIP5 optimum ensemble of GCMs. However, biases and some estimation errors are inherent in every model ensemble, lowering the reliability of models 32. Hence, to accurately estimate drought conditions, it is necessary to use methods that can project droughts efficiently. This ensures a more comprehensive understanding of drought dynamics. Therefore, this research aims to propose a comprehensive technique for studying future drought conditions for the period 2015-2100 at different time scales (i.e., 1, 3, 6, 9, 12, 24, and 48 months). Time scales help to identify the type of drought. Short-term precipitation deficits indicate meteorological drought, medium-term soil moisture deficits point to agricultural drought, and long-term reductions in water bodies signal hydrological drought. Socioeconomic drought can encompass various time scales, depending on its impact on society and the economy.
The resulting Maximum Relevant Prior Feature Ensemble (MRPFE) drought index allows for efficient and accurate drought estimations.

Data and methods
The methods and study area used in this study are briefly explained in the following sub-sections. The same study area was used for studying drought in refs. 32,33. The standardization procedure is selected based on ref. 34. Moreover, ref. 24 also studied the long-term statistics of precipitation using steady-state probabilities.

K-component Gaussian mixture distribution (K-CGMD)
The Standardized Drought Index (SDI) is a crucial measure for assessing drought severity. Fitting an appropriate probability distribution to time series precipitation data is a key point in measuring the SDI. Meteorological variables often follow a multimodal distribution, which means that the distribution of the data has more than one peak. The current SDI estimation methodology is based on a unimodal distribution. In such cases, insufficient distribution fitting reduces the accuracy of drought assessment. In past research, unimodal distributions were commonly employed to compute drought indices such as the Standardized Precipitation Index (SPI) 23 and the Standardized Precipitation Evapotranspiration Index (SPEI) 35. However, the data underlying these indices are multimodal 5, and multimodal distributions can therefore improve computational accuracy. The R packages 'fitdistrplus' and 'propagate' are used to select the appropriate probability function in this study. Recently, refs. 36,37 fitted 32 probability distributions for the calculation of various drought indices using R packages. Ali et al. 5 used K-CGMD based on a standardization method to model precipitation time series and achieve the highest computational accuracy. K-CGMD is a type of mixture model that has been used in a variety of studies to simulate various random events 24,37. Mathematically, the K-CGMD is presented as:

$$f(x) = \sum_{i=1}^{k} w_i \, \mathcal{N}(x \mid s_i, q_i) \tag{1}$$

where $k$ denotes the number of components and $w_i$ specifies the weight of the $i$th mixture component, with the constraint $\sum_{i=1}^{k} w_i = 1$. Here $s_i$ and $q_i$ are the mean and variance of the $i$th component 24.
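As an illustration of Eq. (1), the following minimal Python sketch evaluates the density and the cumulative distribution function of a Gaussian mixture. It is not the fitting procedure used in the study (which relies on R packages); the function names and the example parameters are hypothetical, and the component parameters are assumed to be already estimated.

```python
import math


def normal_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def normal_cdf(x, mean, variance):
    """CDF of a normal distribution with the given mean and variance."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * variance)))


def _check_weights(weights):
    # The mixture weights must be non-negative and sum to one.
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must be non-negative and sum to 1")


def mixture_pdf(x, weights, means, variances):
    """Weighted sum of component densities, as in Eq. (1)."""
    _check_weights(weights)
    return sum(w * normal_pdf(x, s, q) for w, s, q in zip(weights, means, variances))


def mixture_cdf(x, weights, means, variances):
    """Weighted sum of component CDFs (the CDF of the mixture)."""
    _check_weights(weights)
    return sum(w * normal_cdf(x, s, q) for w, s, q in zip(weights, means, variances))


if __name__ == "__main__":
    # Hypothetical two-component example (illustrative values only).
    w, s, q = [0.6, 0.4], [20.0, 80.0], [25.0, 100.0]
    print(mixture_pdf(30.0, w, s, q), mixture_cdf(30.0, w, s, q))
```

The mixture CDF is the quantity that is later transformed into a standardized index value.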

Steady-state probability of Markov chain
A Markov chain is a discrete stochastic process that describes a possible sequence of events 38. Markov chain models can be used to predict the probabilities of future process states, so they play an important role in projecting future droughts. To assess drought conditions of different climatic regions, several authors have used Markov chain stochastic process models, including 39–42. The Markov chain, Transition Probability Matrix (TPM), and steady-state probabilities are explained briefly below. Let $Z = \{z_1, z_2, \ldots, z_r\}$ be the possible process states. The process may begin in one of these states and move sequentially from one state to another. If the chain is currently in state $z_i$, it proceeds to state $z_j$ at the next step with probability $P_{ij}$. The TPM provides the probabilities of changing states 42. The TPM is always a square matrix, where rows show the initial state and columns show the next state. Each element of the TPM is a probability, which means all values satisfy $0 \le P_{ij} \le 1$ and each row sums to 1.
Each formulated TPM satisfies the following conditions:

$$P_{ij} \ge 0, \qquad \sum_{j=1}^{r} P_{ij} = 1 \quad \text{for all } i \text{ and } j.$$

Let $n_{ij}$ be the number of transitions in which the process moves from state $z_i$ (initial state) to state $z_j$ (next state). The transition probabilities are then estimated as:

$$P_{ij} = \frac{n_{ij}}{\sum_{j=1}^{r} n_{ij}}$$

These probabilities are expressed in matrix form. The TPM is a square matrix with real, non-negative elements:

$$P = \begin{pmatrix} P_{11} & P_{12} & \cdots & P_{1r} \\ P_{21} & P_{22} & \cdots & P_{2r} \\ \vdots & \vdots & \ddots & \vdots \\ P_{r1} & P_{r2} & \cdots & P_{rr} \end{pmatrix}$$

The elements of this matrix give the transient probabilities in the process state space. The stationary probabilities of the process quantify the long-term behavior of the process states and are known as steady-state probabilities. After a certain number of steps, the probabilities of a Markov process tend towards a stable steady state. Let $p_j$ represent the limiting probability of state $z_j$ after $n$ steps. The steady-state probability is defined as:

$$p_j = \lim_{n \to \infty} P_{ij}^{(n)}$$

In another way, the steady-state probabilities satisfy:

$$p_j = \sum_{i=1}^{r} p_i P_{ij}, \qquad \sum_{j=1}^{r} p_j = 1$$
The criteria by which MRPFE index values are categorized into different drought classes are provided in 24 .
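The estimation of a TPM from a sequence of classified states and the computation of its steady-state probabilities can be sketched as follows. This is a minimal illustration rather than the code used in the study: states are assumed to be encoded as integers from 0 to r − 1, the function names are hypothetical, a uniform row is assumed for states that are never left (so that every row still sums to 1), and the steady state is obtained by simple power iteration, which presumes an irreducible, aperiodic chain.

```python
def transition_matrix(states, n_states):
    """Estimate the TPM from a sequence of integer-coded states (0 .. n_states - 1)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1  # one observed transition current -> nxt
    tpm = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # Assumption: a state with no observed outgoing transition gets a uniform row.
            tpm.append([1.0 / n_states] * n_states)
        else:
            tpm.append([c / total for c in row])
    return tpm


def steady_state(tpm, tol=1e-12, max_iter=10000):
    """Approximate the steady-state vector p with p = pP by power iteration."""
    n = len(tpm)
    p = [1.0 / n] * n  # start from a uniform distribution
    for _ in range(max_iter):
        new_p = [sum(p[i] * tpm[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p  # last iterate if the tolerance was not reached


if __name__ == "__main__":
    # Hypothetical sequence of two drought classes (0 = wet, 1 = dry).
    sequence = [0, 0, 1, 0, 1, 1]
    P = transition_matrix(sequence, 2)
    print(P, steady_state(P))
```

For seven drought classes, the same functions apply with `n_states=7` once each month has been assigned a class.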

Mann-Kendall (MK) test
The Mann-Kendall (MK) trend test has several applications in environmental and hydrological research and uses test statistics for assessing trends in time series data 43. The MK trend test identifies statistically significant increasing or decreasing trends in long-term temporal data and detects climate trends in meteorological and hydrological time series data 44. Several researchers have employed the MK test to identify trends. For example, ref. 45 used the modified MK test to detect trends in annual precipitation and temperature for nine states of the northeastern United States. The modified MK test is a statistical method that builds upon the original MK trend test by incorporating adjustments or enhancements to better suit specific research contexts. Ref. 46 analyzed the long-term spatio-temporal variations in rainfall from 1901 to 2015 in India using the MK test to identify the pattern of precipitation (rainfall). Vicente-Serrano et al. 36 employed the MK test to determine the monthly and annual patterns (trends) of the Yangtze River flows at the Zhutuo and Cuntan stations of China over 35 years (1980-2015). Praveen et al. 47 employed the MK test to identify potential trends and analyze monthly and annual trends in streamflow, rainfall, and temperature within the Urmia Lake (UL) basin over 42 years from 1971 to 2013. The MK test can be mathematically described as follows:

$$S = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \operatorname{sign}(Y_j - Y_i) \tag{9}$$

where $\operatorname{sign}(Y_j - Y_i)$ of Eq. (9) can be calculated by using Eq. (10):

$$\operatorname{sign}(Y_j - Y_i) = \begin{cases} +1, & Y_j - Y_i > 0 \\ 0, & Y_j - Y_i = 0 \\ -1, & Y_j - Y_i < 0 \end{cases} \tag{10}$$

Positive S values represent an upward trend, negative values indicate a downward trend, and zero signifies the absence of a trend. The following test statistic is formulated to appraise trends within the complete time series data:

$$Z = \begin{cases} \dfrac{S - 1}{\sqrt{\operatorname{Var}(S)}}, & S > 0 \\ 0, & S = 0 \\ \dfrac{S + 1}{\sqrt{\operatorname{Var}(S)}}, & S < 0 \end{cases} \tag{11}$$

In Eq. (11), Var(S) can be calculated by the following equation:

$$\operatorname{Var}(S) = \frac{n(n-1)(2n+5) - \sum_{i=1}^{m} t_i (t_i - 1)(2 t_i + 5)}{18} \tag{12}$$

where $m$ is the number of tied groups, $t_i$ is the number of data points in the $i$th tied group, and $n$ represents the total number of data points.
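The MK statistic, its variance with the usual correction for ties, and the standardized test statistic can be sketched in a few lines of Python. This is an illustrative implementation of the classical (non-seasonal, unmodified) test; the function name and the two-sided p-value based on the normal approximation are choices made for this sketch, not details taken from the study.

```python
import math
from collections import Counter


def mann_kendall(y):
    """Classical Mann-Kendall test: returns (S, Var(S), Z, two-sided p-value)."""
    n = len(y)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = y[j] - y[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    # Tie correction: t is the size of each group of equal values
    # (groups of size 1 contribute zero to the sum).
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(y).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return s, var_s, z, p_value


if __name__ == "__main__":
    # Hypothetical strictly increasing series.
    print(mann_kendall([1.0, 2.0, 3.0, 4.0, 5.0]))
```

A p-value below the chosen significance level (for example 0.05) indicates a statistically significant monotonic trend, and the sign of Z gives its direction.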

Sen's slope estimator (SSE)
Sen's Slope Estimator (SSE) is a nonparametric statistical method widely used for determining trend magnitudes in time series data 48. SSE finds application in hydro-meteorological time series for both trend analysis 44 and the prediction of trend magnitude 49,50. SSE has been employed in several studies to gain insight into time series data trends. For example, ref. 51 examined spatiotemporal trends in annual rainfall utilizing SSE. Harka et al. 52 calculated trends in the time series data of identified COVID-19 cases in India using SSE. Additional applications are found in 51,53–56. SSE was introduced by ref. 57. A brief mathematical description of SSE is as follows:

$$J_i = \frac{Z_t - Z_i}{t - i}$$

where $Z_t$ and $Z_i$ denote data values at times $t$ and $i$, respectively, with $t > i$, and $J_i$ signifies the slope between the data points $Z_t$ and $Z_i$. Here $t$ varies from 2 to $n$ and $i$ varies from 1 to $n - 1$, where $n$ denotes the total number of data points in the temporal data.
For an individual datum in every period, there will be $N = n(n - 1)/2$ slope estimates. For several observations in one or more periods, $N < n(n - 1)/2$.
The median of the $N$ slope values, $Q_i$, is calculated from the slopes sorted in ascending order by the following equation:

$$Q_i = \begin{cases} J_{(N+1)/2}, & N \text{ odd} \\ \dfrac{1}{2}\left(J_{N/2} + J_{(N+2)/2}\right), & N \text{ even} \end{cases}$$

A positive value of $Q_i$ indicates an upward (increasing) trend, while a negative value of $Q_i$ indicates a downward (decreasing) trend.
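The estimator amounts to taking the median of all pairwise slopes, which can be sketched as follows. The sketch assumes one observation per equally spaced time step, so that N = n(n − 1)/2; the function name is hypothetical.

```python
from statistics import median


def sens_slope(z):
    """Median of all pairwise slopes (z[t] - z[i]) / (t - i) for t > i."""
    n = len(z)
    if n < 2:
        raise ValueError("at least two data points are required")
    slopes = [(z[t] - z[i]) / (t - i) for i in range(n - 1) for t in range(i + 1, n)]
    # statistics.median averages the two central values when the count is even.
    return median(slopes)


if __name__ == "__main__":
    # Hypothetical series with a constant increase of 2 per time step.
    print(sens_slope([1.0, 3.0, 5.0, 7.0]))
```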

Application
This research applies temporal data of precipitation from 18 climate models of CMIP6, emphasizing 32 grid points located on the Tibetan Plateau, which lies within the national territory of China. We selected these models and grid points by following the study of ref. 24. The Tibetan Plateau encompasses an area of more than 2.5 million km² (26.00-39.47 N, 73.19-104.47 E) and is the world's largest plateau 58. This region is also called the "world water tower" 59. The Tibetan Plateau is a region in southwest China, and a large number of Asian rivers originate there. However, the region is prone to global warming and climate change 60. The temperature on the Tibetan Plateau has significantly increased over the last few decades 61. It is therefore beneficial to measure and assess drought with respect to global warming and climate change. Several researchers have performed spatial-temporal analyses associated with drought forecasting, assessment, and monitoring in Tibetan Plateau regions, including 62–66. In this study, we use simulated monthly time series of precipitation data of the CMIP6 models, which range from 1961 to 2014. We utilized CN05.1 data as the observational data set of precipitation 32. In addition, we utilize three different future scenarios, i.e., SSP1-2.6, SSP2-4.5, and SSP5-8.5. Information on the selected models is available in ref. 24.

The proposed method
In this section, we use precipitation data from 1961 to 2014 from several GCMs, with CN05.1 as the corresponding observational data. Moreover, this section describes the process involved in the development of the MRPFE index. Figure 1 shows the flowchart of the MRPFE index. The mathematical expressions for the suggested weighting scheme for combining precipitation time series data obtained from various CMIP6 models are shown in this section. The proposed weighting scheme distinguishes itself by giving more weight to those values whose frequency contributes more to homogeneity among them. In contrast, divergent values are given lower weights. This minimizes the impact of extreme values in the aggregation process. The proposed weighting scheme is implemented in the following steps. Let $D = (Z_1, Z_2, Z_3, \ldots, Z_K)$ be the time series data of precipitation simulated by various models in a specific region, and let $y$ be the observed time series data, where $K$ is the total number of GCMs. The primary goal of the weighting scheme is to reduce the effect of extreme values. Our proposed scheme is based on three major phases, followed by data aggregation. Each phase is described below.
Phase 1. Weighting each model
This phase assigns each model a weight based on its difference from observed values. The steps are described below.
Step 1. The absolute difference between simulated and observed
In this stage, we take the absolute difference between the data simulated by the GCMs ($Z_i$) and the observed data $y$:

$$a_1 = |Z_i - y| \tag{16}$$

Step 2. Combining observed and simulated data
In this stage, we add the absolute value of $y$ to the absolute value of each model value:

$$a_2 = |Z_i| + |y| \tag{17}$$

Step 3. Assigning weights
This stage assigns weights to each model by taking the ratio of Eqs. (16) and (17):

$$T_i = \frac{a_1}{a_2} \tag{18}$$

Here, $T_i$ is the weight of the $i$th model.
Step 4. Standardization of weights
This stage standardizes the weights assigned to each model. For standardization, each assigned weight is divided by the sum of the weights of all models:

$$S_i = \frac{T_i}{\sum_{j=1}^{K} T_j} \tag{19}$$

Here $S_i$ represents the standardized weight of each model.
Phase 2. Assigning spatiotemporal weights to each value
This phase assigns each value of each model a relevant weight based on its location and time. This phase includes the following steps.
Step 1. Exponentials of the absolute differences
As $a_1$ in Eq. (16) represents the difference between the observed value and the $i$th GCM data, the first step of this phase calculates the exponential of the $i$th difference, $\exp(a_1)$ (Eq. 20). This step aims to magnify the differences.
Step 2. Estimation of weights
In this step, we assign a high weight to values with small deviations and a low weight to values with large deviations, which gives the weights $p_i$ (Eq. 21).
Step 3. Standardizing weights
In this stage, we standardize $p_i$ as follows:

$$w_i = \frac{p_i}{\sum_{j=1}^{K} p_j} \tag{22}$$

subject to the condition that $\sum_{i=1}^{K} w_i = 1$. Phase 2 is iterated for each model.
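Phase 2 can be illustrated with the short Python sketch below. The exact form of the weights in Step 2 is not reproduced in this text, so the sketch assumes the reciprocal of the exponential, exp(−a), which is one simple choice that gives high weights to small deviations; this assumption, the function name, and the example values are not taken from the study. The weights are standardized across the K models at every time step.

```python
import math


def spatiotemporal_weights(simulated, observed):
    """Return weights[i][t] for model i at time t, summing to 1 over models at each t.

    simulated: list of K series (one per model), each as long as `observed`.
    """
    k = len(simulated)
    n = len(observed)
    weights = [[0.0] * n for _ in range(k)]
    for t in range(n):
        # Step 1: absolute deviation of each model from the observation.
        diffs = [abs(simulated[i][t] - observed[t]) for i in range(k)]
        # Step 2 (ASSUMED form): weight proportional to exp(-deviation).
        # Subtracting the smallest deviation avoids underflow and does not
        # change the standardized weights.
        d_min = min(diffs)
        raw = [math.exp(-(d - d_min)) for d in diffs]
        # Step 3: standardize so the weights of the K models sum to 1.
        total = sum(raw)
        for i in range(k):
            weights[i][t] = raw[i] / total
    return weights


if __name__ == "__main__":
    # Hypothetical monthly precipitation for three models and two months.
    models = [[10.0, 20.0], [12.0, 20.0], [15.0, 26.0]]
    obs = [10.0, 20.0]
    for row in spatiotemporal_weights(models, obs):
        print(row)
```

With this choice, a model that matches the observation exactly at a given time step receives the largest weight at that step, and strongly deviating values are down-weighted quickly.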

Phase 3. Hybridization of Phases 1 & 2
To combine both weights, we take the average of Eqs. (19) and (22):

$$P_i = \frac{S_i + w_i}{2} \tag{23}$$

Here $P_i$ represents the proposed weights, and we name this weighting technique "Weighted Ensemble".
Phase 4. Data aggregation
This phase aggregates the data of the various CMIP6 GCM simulations under the proposed weights:

$$P_{ct} = \sum_{i=1}^{K} P_i Z_i \tag{24}$$

After aggregating the data, we fit multiple linear regression models for future projections. We then standardize $P_{ct}$ under K-CGMD using the 12-component combined Cumulative Distribution Function (CDF) of Eq. (25), denoted $M(x)$. We selected 12 components as there are 12 months in a year. To standardize this CDF for the calculation of the proposed drought index MRPFE, the following method is applied:

$$\mathrm{MRPFE} = h\left(t - \frac{C_0 + C_1 t + C_2 t^2}{1 + q_1 t + q_2 t^2 + q_3 t^3}\right)$$

where

$$t = \begin{cases} \sqrt{\ln\left(\dfrac{1}{M(x)^2}\right)}, & 0 < M(x) \le 0.5 \quad (h = -1) \\ \sqrt{\ln\left(\dfrac{1}{(1 - M(x))^2}\right)}, & 0.5 < M(x) < 1 \quad (h = +1) \end{cases}$$

Here $C_0 = 2.515517$, $C_1 = 0.802853$, $C_2 = 0.010328$, $q_1 = 1.432788$, $q_2 = 0.189269$, and $q_3 = 0.001308$ are constants. This index is based on the spatiotemporally weighted combination of precipitation time series. It is named the Maximum Relevant Prior Feature Ensemble (MRPFE) index.
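The hybridization, aggregation, and standardization steps can be sketched together in Python. The sketch assumes that the aggregation is a weighted sum of the model series and that the standardization uses the widely used rational approximation to the inverse standard normal CDF (Abramowitz and Stegun), whose published constant for the squared term in the denominator is 0.189269. The function names and example values are hypothetical, and the regression step for future projections is not shown.

```python
import math

# Constants of the Abramowitz and Stegun rational approximation.
C0, C1, C2 = 2.515517, 0.802853, 0.010328
Q1, Q2, Q3 = 1.432788, 0.189269, 0.001308


def hybrid_weights(model_weights, value_weights):
    """Average the per-model weight S_i with the per-value weights w_i(t)."""
    return [[0.5 * (model_weights[i] + w) for w in value_weights[i]]
            for i in range(len(model_weights))]


def aggregate(simulated, weights):
    """Weighted sum over models at each time step (ASSUMED form of the aggregation)."""
    n = len(simulated[0])
    return [sum(weights[i][t] * simulated[i][t] for i in range(len(simulated)))
            for t in range(n)]


def cdf_to_index(m):
    """Map a CDF value 0 < m < 1 to an approximate standard normal quantile."""
    if not 0.0 < m < 1.0:
        raise ValueError("CDF value must lie strictly between 0 and 1")
    if m <= 0.5:
        h, t = -1.0, math.sqrt(-2.0 * math.log(m))  # equals sqrt(ln(1 / m^2))
    else:
        h, t = 1.0, math.sqrt(-2.0 * math.log(1.0 - m))
    ratio = (C0 + C1 * t + C2 * t * t) / (1.0 + Q1 * t + Q2 * t * t + Q3 * t ** 3)
    return h * (t - ratio)


if __name__ == "__main__":
    # Hypothetical weights and data for two models and two time steps.
    P = hybrid_weights([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])
    print(aggregate([[10.0, 20.0], [30.0, 40.0]], P))
    print(cdf_to_index(0.1), cdf_to_index(0.5), cdf_to_index(0.9))
```

Negative index values correspond to CDF values below 0.5 (drier than the median) and positive values to wetter conditions.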

Comparative statistics Simple model averaging (SMA)
Simple model averaging (SMA) is a type of simple mean that gives equal importance to each value in the dataset 67, and it has been used many times to combine GCM ensembles 29,68. In this study, it is applied as a comparative method to the proposed weighting scheme. The calculation is based on the following equation:

$$S(t) = \frac{1}{k} \sum_{i=1}^{k} P_i(t)$$

where $S(t)$ is the SMA of GCMs, $P_i$ is the precipitation projection for the $i$th GCM, and $k$ is the number of GCMs.

Relative absolute error (RAE) and mean absolute error (MAE)
Relative Absolute Error (RAE), a statistical tool, assesses the accuracy and precision of projections relative to a reference value. RAE is based on the absolute difference between the predicted and the reference value, expressed in relative terms. In contrast, MAE, another statistical performance metric, represents the average of the absolute differences between predicted and corresponding reference values. These methods are frequently used in recent studies 69. Refs. 24,69 have explained these methods mathematically.
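The comparison between an ensemble and the observations can be sketched as follows. The SMA and MAE follow the definitions given above. For RAE, the sketch assumes the conventional definition (total absolute error divided by the total absolute deviation of the reference values from their mean), which may differ from the exact form used in the cited studies. The function names and data are hypothetical.

```python
def simple_model_average(simulated):
    """Equal-weight mean over the K model series at each time step."""
    k = len(simulated)
    return [sum(model[t] for model in simulated) / k for t in range(len(simulated[0]))]


def mean_absolute_error(predicted, reference):
    """Average absolute difference between predicted and reference values."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def relative_absolute_error(predicted, reference):
    """ASSUMED conventional RAE: total absolute error relative to that of the mean."""
    mean_ref = sum(reference) / len(reference)
    denominator = sum(abs(r - mean_ref) for r in reference)
    if denominator == 0:
        raise ValueError("reference values are constant; RAE is undefined")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / denominator


if __name__ == "__main__":
    # Hypothetical values for two models and three time steps.
    sma = simple_model_average([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
    obs = [2.0, 4.0, 3.0]
    print(sma, mean_absolute_error(sma, obs), relative_absolute_error(sma, obs))
```

Lower values of both measures indicate that an ensemble is closer to the observations, which is the basis for comparing the WE scheme with SMA.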

Estimation of weights of the proposed index
In this study, a novel weighting index is proposed to address biases and reduce the impact of extreme precipitation values by placing greater emphasis on values that deviate less from observational data. Figure 2 presents the selected locations of the Tibetan Plateau. A temporal representation of observed and simulated model data is shown in Fig. 3. Table 1 provides the resolution of each selected model and summary statistics of weights assigned to all selected GCMs at one random point. The GCMs and their corresponding weights are listed in rows, and the columns show the minimum, maximum, and average weights assigned to each GCM. It can be observed that the average weights assigned to the GCMs range from 0.046 to 0.054. The minimum and maximum weights assigned to each GCM also vary, with some GCMs having weights as low as 0.023 and as high as 0.057. The maximum average weight (0.054) is assigned to MPI-ESM1-2-LR and the minimum average weight (0.046) to CNRM-CM6-1. Overall, the table provides useful information about the weights assigned to the GCMs and their relative importance in the ensemble. Furthermore, Fig. 4 displays the assigned weights of each CMIP6 model at one random grid point. Table 2 shows the monthly weights that are assigned to all selected GCMs at one random point. From the table, we can see that the maximum weights were assigned to the ACCESS-ESM1-5 model in December and the minimum to the MPI-ESM1-2-LR model in July.

Validation of the proposed WE scheme
Table 3 shows the summary statistics of MAE and RAE of the WE weighting scheme and the SMA technique. The results show that the errors of the WE scheme are considerably lower than those of the SMA scheme. Based on these findings, it can be stated that our proposed weighting scheme is more efficient than the traditional SMA scheme.

Estimation of MRPFE using K-CGMD under a different scenario
In this section, we examine the effectiveness of K-CGMD for modeling drought index values. The accuracy of K-CGMD at different time scales is compared with different univariate probability distributions using the Bayesian Information Criterion (BIC). BIC is used to evaluate models and determine which trade-off between model fit and complexity is optimal. A more favorable model fit is indicated by lower BIC values. Table 4 presents the BIC values for univariate distributions and the K-CGMD model for various time scales of three different scenarios. The findings reveal that, for SSP1-2.6, SSP2-4.5, and SSP5-8.5, the K-CGMD model consistently exhibits lower BIC values across all time scales compared to unimodal distributions. This consistent pattern suggests that the K-CGMD model is a superior fit for the data for most of the time scales within all three scenarios. Consequently, the K-CGMD model proves to be a more reliable and effective approach for standardizing drought indices compared to unimodal distributions. In Fig. 5, probability and Q-Q plots visually demonstrate K-CGMD's superiority over unimodal probability models in modeling drought indices for scenario SSP1-2.6.

Trend assessment under Mann-Kendall and Sen's slope
The significance and direction of future drought trends are evaluated using Sen's slope and seasonal Mann-Kendall approaches 36. This method is used to evaluate the drought conditions on the Tibetan Plateau. Table 5 presents trend analyses utilizing Mann-Kendall and Sen's slope methods for three scenarios across various time scales. Each scenario and time scale combination includes the Kendall Z-value, Sen's slope, p-value, significance level, and trend direction. The direction of the trend, whether it is increasing or decreasing, is determined by the trend analysis. Additionally, a p-value below 0.05 indicates that the trend is statistically significant in this analysis.
The results indicate a predominantly decreasing trend across all time scales for SSP1-2.6, with an exception at time scale 1, which exhibits an increasing trend. For SSP2-4.5, the trend is mostly increasing for smaller time scales (Scale-1, Scale-3, and Scale-6) and decreasing for larger time scales (Scale-24 and Scale-48). For SSP5-8.5, the trend is consistently decreasing for all time scales and is statistically significant for all except Scale-1.

Estimation of drought using steady-state probabilities
The steady-state probability of the Markov chain is employed in this study to measure the long-term impact of random events. Precipitation is classified into seven classes, namely Extreme Wet (EW), Severe Wet (SW), Moderate Wet (MW), Near Normal (NN), Moderate Drought (MD), Severe Drought (SD), and Extreme Drought (ED). Table 6 reports the steady-state probabilities for the various drought classes under three different scenarios at each time scale. From this information, we conclude that, under SSP1-2.6, the probability of ED is greater than the probability of EW, the probability of MD is less than the probability of MW, and the probability of SD is greater than the probability of SW. After analyzing the probabilities of SSP2-4.5 and SSP5-8.5 across different time scales, we noticed that dry conditions are more probable than wet conditions. It is anticipated that in all locations, drought conditions will be more prevalent than wet conditions.

Discussion
The analysis of the proposed MRPFE drought index and the novel weighting scheme WE demonstrates significant improvements in the accuracy of drought estimation. The application of the WE scheme to the CMIP6 dataset shows a marked reduction in both comparative measures compared to the SMA approach. CMIP models have various uncertainties, and this unpredictability is an essential part of climate modeling. Understanding these uncertainties is integral for making informed decisions regarding climate change mitigation. By utilizing multi-model ensembles (MME), probabilistic approaches, and transparent communication, we can better manage uncertainties and enhance the robustness of climate projections. Variations in the analysis results of drought trends under different scenarios arise from the varying assumptions and projections related to future greenhouse gas emissions, land use changes, and other socio-economic factors. Each scenario reflects a different trajectory of human activity and its impact on the climate system, leading to variations in the projected severity, frequency, and spatial distribution of droughts. The guiding significance of utilizing different emission scenarios is to help policymakers and to gain insight into better risk management. The proposed index, which employs the K-CGMD and Markov chain steady-state probability analysis, provides a robust framework for estimating the likelihood of various drought states. The results indicate that the MRPFE index effectively captures the temporal dynamics of drought conditions, offering a more nuanced understanding of drought trends in this region. Furthermore, the study's findings highlight the importance of considering multiple emission scenarios when projecting future drought conditions. The variations in drought trends observed under different scenarios underscore the influence of future greenhouse gas emissions, land use changes, and other socio-economic factors on drought severity, frequency, and spatial distribution. This insight is crucial for policymakers and stakeholders involved in climate change mitigation and adaptation planning.

Conclusion
Drought is a naturally occurring phenomenon that is caused by irregularities in climate variables such as precipitation patterns. There are numerous ecological causes concerned with classifying drought conditions at a particular monitoring station. Therefore, proper pattern processing methods are required to project and investigate the periodic data about the occurrences of drought classes. This study provides a novel weighting scheme, "WE", to combine multiple models and a new drought index, "MRPFE", to project drought. The novel weighting scheme [...] the Markov Chain approach. The MAE and RAE have been used as relative measures to assess the performance of the proposed weighting scheme. The comparative inference shows that the proposed weighting scheme has greater efficiency than SMA in combining GCMs. Looking ahead, the findings suggest that the Tibetan Plateau region may experience an increase in the frequency of drought due to a declining pattern of precipitation. https://doi.org/10.1038/s41598-024-66804-5

Figure 2. The geographical locations of the selected study area.

Figure 4. Weights assigned to each model at one random location.

Figure 5. Probability and QQ-plot of univariate and K-CGMM for SSP1-2.6 future scenarios at time scale-1 on 76.5° E and 36° N location.

Figure 6. Probability and QQ-plot of univariate and K-CGMM for SSP2-4.5 future scenarios at time scale-1 on 76.5° E and 36° N location.

Figure 7. Probability and QQ-plot of univariate and K-CGMM for SSP5-8.5 future scenarios at time scale-1 on 76.5° E and 36° N location.

Table 1. Resolution of models and summary statistics of weights assigned to each GCM at one random point.

Table 4. BIC of Unimodal and K-CGMD at different time scales for three different scenarios at one location.